
Statistics in Medicine

Wiley

Preprints posted in the last 7 days, ranked by how well they match Statistics in Medicine's content profile, based on 34 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Estimating COVID-19 Cumulative Incidence from Seroprevalence Surveys accounting for Time-Varying Seroreversion: A Fully Bayesian Methodology

Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.

2026-06-10 epidemiology 10.64898/2026.06.09.26355264 medRxiv
Top 0.1%
4.9%

Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
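The core correction can be sketched in point-estimate form. If assay sensitivity depends on time since infection, a survey's effective sensitivity is the incidence-weighted average of that sensitivity curve, and the classical Rogan-Gladen estimator then rescales the observed seroprevalence. This is a simplified, non-Bayesian sketch and not the authors' model; the function names, the discretisation into periods, and the sensitivity values in the example are hypothetical.

```python
import numpy as np


def effective_sensitivity(incidence_weights, sensitivity_by_delay):
    """Incidence-weighted mean assay sensitivity at the time of the survey.

    Element k of both arrays refers to infections that occurred k periods
    before the survey (hypothetical discretisation).
    """
    w = np.asarray(incidence_weights, dtype=float)
    se = np.asarray(sensitivity_by_delay, dtype=float)
    return float(w @ se / w.sum())


def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Classical Rogan-Gladen correction, truncated to [0, 1]."""
    corrected = (observed_prevalence + specificity - 1.0) / (
        sensitivity + specificity - 1.0
    )
    return min(max(corrected, 0.0), 1.0)


# Hypothetical example: equal incidence in two periods, with sensitivity
# waning from 0.9 (recent infections) to 0.7 (older infections).
se_eff = effective_sensitivity([1.0, 1.0], [0.9, 0.7])  # 0.8
cumulative_incidence = rogan_gladen(0.08, se_eff, 1.0)  # about 0.10
```

With perfect specificity the adjustment factor reduces to 1 / se_eff (1.25 here). A fully Bayesian treatment would instead place priors on the sensitivity curve and carry that uncertainty into the cumulative-incidence estimate.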

2
A New Mixed Frequency Regression Model For Environmental Epidemiology

Shukla, N.; Bartington, S. E.; Hansell, A. L.; Lucas, T. C.

2026-06-04 epidemiology 10.64898/2026.06.03.26354801 medRxiv
Top 0.1%
3.1%

Background: In the absence of high-resolution response data, exposure-response modelling often relies on exposure data aggregated to a low frequency, losing high-resolution information. Mixed Data Sampling (MIDAS) from econometrics offers an alternative but is limited by its inability to make high-resolution predictions, its inflexible likelihoods and penalised nonlinear functions, and its limited visualization options. We propose a mixed-frequency Distributed Lag Non-linear Model (mf-DLNM), which can eliminate the need to aggregate exposure data in environmental epidemiology and provide high-resolution predictions for time series studies. Methods: We evaluated the inferential and predictive performance of the mf-DLNM. To evaluate its ability to estimate exposure-response relationships, we applied the mf-DLNM and a same-frequency DLNM (sf-DLNM) to data from the West Midlands, UK. Additionally, we compared the predictive performance of the mf-DLNM with the sf-DLNM and MIDAS across nine regions of England. Because MIDAS cannot predict at the resolution of the predictor (daily), we compared the predictive performance of the mf-DLNM and MIDAS at weekly resolution. To test the model's ability to predict high-temporal-resolution (daily) risk, we compared the sf-DLNM (with access to daily mortality counts) with the mf-DLNM (with access only to weekly mortality counts). Results: In the West Midlands example, the mf-DLNM performed comparably to the sf-DLNM in estimating the daily risk of respiratory mortality associated with temperature. Furthermore, the mf-DLNM and MIDAS exhibited similar performance for weekly predictions. For high-resolution predictions, the mf-DLNM and sf-DLNM performed similarly, despite the mf-DLNM having access only to low-resolution response data.
Conclusion: This mixed-frequency approach in environmental epidemiology overcomes the limitations of predicting health risks using aggregated exposure data and provides estimates of high-resolution outcomes in the absence of high-frequency health outcome datasets.
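The central mixed-frequency idea, with daily exposure driving daily risk while the likelihood is evaluated on weekly counts, can be sketched as follows. This assumes a single linear distributed lag and a Poisson likelihood purely for illustration; it is not the authors' mf-DLNM, which uses non-linear exposure-lag functions, and all names here are hypothetical.

```python
import math

import numpy as np


def weekly_expected_counts(daily_exposure, lag_weights, intercept, beta):
    """Daily expected counts from a linear distributed lag, summed to weeks.

    Days without a full lag history are dropped, and a trailing partial
    week is discarded.
    """
    x = np.asarray(daily_exposure, dtype=float)
    w = np.asarray(lag_weights, dtype=float)
    n_lags = len(w)
    # Lagged exposure for day d: sum over l of w[l] * x[d - l]
    lagged = np.array(
        [w @ x[d - np.arange(n_lags)] for d in range(n_lags - 1, len(x))]
    )
    daily_mu = np.exp(intercept + beta * lagged)
    n_weeks = len(daily_mu) // 7
    weekly_mu = daily_mu[: n_weeks * 7].reshape(n_weeks, 7).sum(axis=1)
    return weekly_mu, daily_mu


def weekly_poisson_loglik(weekly_counts, weekly_mu):
    """Poisson log-likelihood of the observed weekly counts."""
    y = np.asarray(weekly_counts, dtype=float)
    log_factorials = sum(math.lgamma(v + 1.0) for v in y)
    return float(np.sum(y * np.log(weekly_mu) - weekly_mu) - log_factorials)
```

Because the daily expected counts are retained, a model fitted this way can still report daily risk even though only weekly counts entered the likelihood.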

3
Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.

2026-06-06 epidemiology 10.64898/2026.05.29.26354427 medRxiv
Top 0.2%
1.7%

Background: In prospective cohort studies where an exposure is measured repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), in which users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC), which may affect these associations. Methods: We present a modified version of the SLCMA, called direct and mediated effects (DME)-SLCMA, which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptom scores increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months, adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios, including our DCHS example, in which selections that ignored TVC were prone to be incorrect when TVC-mediated exposure effects were present. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible; when this is not possible, our simulation study indicates that scenarios where mediated effects are comparable to, or greater in magnitude than, direct effects are the most prone to confounding.
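The hypothesis-selection step can be sketched by encoding repeated exposures as candidate life course hypotheses (one critical period per time point plus accumulation) and picking the hypothesis that would enter a LARS path first, which is the one with the largest absolute correlation with the outcome. The covariate adjustment shown is an assumption: a plain residualisation on the covariates, which does not separate direct from mediated effects as DME-SLCMA does. Function names are hypothetical.

```python
import numpy as np


def hypothesis_variables(exposures):
    """Encode repeated exposures (n x T) as candidate life course hypotheses."""
    X = np.asarray(exposures, dtype=float)
    hyps = {f"critical_{t + 1}": X[:, t] for t in range(X.shape[1])}
    hyps["accumulation"] = X.sum(axis=1)
    return hyps


def residualize(v, Z):
    """Residuals of v after least-squares regression on [1, Z]."""
    Z1 = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(Z1, v, rcond=None)
    return v - Z1 @ coef


def first_selected(exposures, outcome, covariates=None):
    """Hypothesis that would enter a LARS path first (highest |correlation|)."""
    hyps = hypothesis_variables(exposures)
    y = np.asarray(outcome, dtype=float)
    if covariates is not None:
        # Simple covariate adjustment (assumption, not the DME decomposition)
        y = residualize(y, covariates)
        hyps = {k: residualize(v, covariates) for k, v in hyps.items()}
    corr = {k: abs(np.corrcoef(v, y)[0, 1]) for k, v in hyps.items()}
    return max(corr, key=corr.get)
```

In a simulation where a covariate drives both the third exposure and the outcome, the unadjusted selection tends to favour the wrong period, while the adjusted selection recovers the true critical period.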

4
A risk-of-contagion index using a Bayesian based model for the COVID-19 epidemic in Mexico

Corona-Moreno, R.; Acuna-Zegarra, M. A.; Santana-Cibrian, M.; Velasco-Hernandez, J. X.

2026-06-10 health policy 10.64898/2026.06.09.26355274 medRxiv
Top 0.6%
0.4%

During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated covidestim, a Bayesian nowcasting model, to estimate total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Priors for the disease-progression distributions were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was then translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision-making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro, Mexico, as an anticipatory tool for school reopening and public-space management. This demonstrates that Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.
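A common form of the event-based risk calculation assumes each attendee is an independent random draw from the population, so the probability that a gathering of n people contains at least one infectious person is 1 - (1 - I/N)^n, with I active infections in a population of N. The sketch below uses that assumption; the four-colour cut-offs and the example municipality are hypothetical placeholders, not the thresholds or data used in the paper.

```python
def prob_any_infectious(active_infections, population, group_size):
    """P(at least one infectious person among group_size attendees).

    Assumes attendees are independent random draws from the population.
    """
    prevalence = active_infections / population
    return 1.0 - (1.0 - prevalence) ** group_size


def traffic_light(risk, cutoffs=(0.10, 0.30, 0.60)):
    """Map a risk in [0, 1] to one of four colours (hypothetical cut-offs)."""
    colours = ("green", "yellow", "orange", "red")
    for colour, cutoff in zip(colours, cutoffs):
        if risk < cutoff:
            return colour
    return colours[-1]


# Hypothetical municipality: 1,000 active infections among 100,000 residents.
risk_50 = prob_any_infectious(1_000, 100_000, 50)
print(round(risk_50, 3), traffic_light(risk_50))  # 0.395 orange
```

Evaluating the same prevalence at several gathering sizes is what makes the indicator interpretable: risk rises quickly with n even when prevalence is low.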

5
Exploratory Assessment of Pulsed-Wave Doppler Representations of Lung Sounds Using Deep Learning: An In-Vitro Phantom Study

Saad, A. A.; Murthi, S. B.; Boctor, E. M.; Teeter, W. A.; Seam, N.

2026-06-10 respiratory medicine 10.64898/2026.06.09.26353787 medRxiv
Top 1%
0.2%

The increasing availability of portable ultrasound systems motivates exploration of novel approaches to respiratory signal assessment. In this in-vitro study, we investigate whether pulsed-wave (PW) Doppler ultrasound can capture structured spectral patterns from replayed lung sound recordings. Digitized respiratory sounds were replayed through a tissue-mimicking ultrasound phantom, generating 1,478 PW Doppler spectral images from recordings associated with healthy subjects and several externally labeled disease categories. Exploratory classification experiments using a ResNet-18 architecture demonstrated that these Doppler representations contain learnable differences under controlled conditions. These findings motivate further investigation into PW Doppler as a potential representation of respiratory acoustics.

6
KESOZI Digital Twin: Physics-Informed Neural Network for Independent Estimation and Prediction of Childhood Diarrheal Disease Burden in Kenya, Somaliland, and Zimbabwe

KESOZI Digital Twin; Agumba, J. O.; Namusonge, L.; Ogendo, J.; Hassan, M. A.; Pembere, A.; Takavarasha, M.

2026-06-04 epidemiology 10.64898/2026.06.03.26354823 medRxiv
Top 1%
0.2%

Childhood diarrheal disease remains a leading cause of morbidity and mortality among children under five years in sub-Saharan Africa, particularly in settings affected by inadequate sanitation, climate variability, malnutrition, and limited healthcare access. Conventional forecasting approaches are often constrained by sparse surveillance data, weak spatial representation, and limited incorporation of mechanistic disease dynamics. This study presents a Physics-Informed Multimodal Artificial Intelligence Digital Twin framework that integrates Physics-Informed Neural Networks, Graph Neural Networks, diffusion-reaction epidemiological modeling, multimodal fusion learning, and Digital Twin simulation to estimate and predict childhood diarrheal disease burden in Kenya, Somaliland, and Zimbabwe. Using public epidemiological, environmental, climate, sanitation, and synthetic proof-of-concept datasets, the framework modeled temporal disease dynamics, spatial transmission, pathogen-attributed burden, and outbreak trajectories while enforcing epidemiological consistency through physics-informed optimization. Results demonstrated robust forecasting performance, enhanced spatial transmission modeling, uncertainty-aware predictions, and realistic outbreak simulations across the three countries. Rotavirus, Shigella, and Cryptosporidium were identified as major contributors to modeled mortality burden, while unsafe water exposure, poor sanitation, malnutrition, and climate-sensitive transmission substantially increased disease risk. Compared with a Bayesian baseline model, the multimodal framework achieved superior nonlinear risk characterization, geospatial learning, and temporal prediction. These findings highlight the potential of scientific machine learning and digital twin systems for infectious disease surveillance, outbreak forecasting, climate-health analytics, and evidence-based public health decision-making in low-resource African settings. 
Keywords: Physics-Informed Neural Networks, Graph Neural Networks, Digital Twin, Childhood Diarrheal Disease, Epidemiology, Kenya, Somaliland, Zimbabwe, Scientific Machine Learning, Spatial Epidemiology, Multimodal Fusion

7
Behavioral and Functional Neuroimaging Effects of Delivering a Course of Repetitive Transcranial Magnetic Stimulation to Personalized Targets Within the Ventrolateral Or Dorsolateral Prefrontal Cortex in Treatment-Seeking Participants with Cannabis Use Disorder

McCalley, D.; Wong, B.; Geoly, A.; Struckman, W.; Azeez, A.; Kaloiani, I.; Kim, B.; Ninomiya, S.; Ehrie, J.; Austelle, C. W.; Rolle, C. E.; Kim, J. P.; Froeliger, B.; McRae-Clark, A. L.; Sahlem, G.

2026-06-10 addiction medicine 10.64898/2026.06.08.26355193 medRxiv
Top 1%
0.2%

Background: Repetitive Transcranial Magnetic Stimulation (rTMS) is a promising treatment across addictive disorders, including Cannabis Use Disorder (CUD). Stimulation of two rTMS targets, the ventromedial prefrontal cortex (vmPFC) and the left dorsolateral prefrontal cortex (LDLPFC), which are hubs of the limbic and executive control networks respectively, may yield differential effects. In this pilot trial, we explored the differential effects of 36 sessions of rTMS applied to either the vmPFC or the LDLPFC. Methods: Treatment-seeking participants with moderate or severe CUD (n = 20, 10 F, age = 33.3 ± 9.8 SD) were randomized to 36 sessions of open-label rTMS (two sessions per visit, two or three visits per week) to either the LDLPFC (3000 pulses; 10 Hz) or the vmPFC (900 pulses; 1 Hz) using personalized functional Magnetic Resonance Imaging (fMRI) targets, along with three sessions of Motivational Enhancement Therapy. At baseline and following rTMS, the Time-Line Follow-Back was used to measure days per week of cannabis use, and the fMRI Regulation of Craving (ROC) task was used to measure network activation to cues associated with long-term negative ('Later') and short-term positive ('Now') consequences of cannabis use. Results: Eighty percent of participants completed study rTMS. There was a significant decrease in days per week of cannabis use in both groups (vmPFC: d = 7.9; LDLPFC: d = 3.1) between the four weeks of baseline and the seven weeks of follow-up. LDLPFC-rTMS reduced fMRI BOLD signal magnitude and increased LDLPFC functional connectivity in response to cues, while vmPFC-rTMS reduced functional connectivity. Conclusions: Treatment-seeking participants with CUD reduced the number of days per week they used cannabis when receiving rTMS applied to either the LDLPFC or the vmPFC, while fMRI effects differed by treatment target. Larger sham-controlled trials are needed to determine efficacy and identify biomarkers.

8
Compositional microbiome-based signatures associate with general health status: findings from a large population-based cohort study

Pujolassos, M.; Kurilshikov, A.; Weersma, R. K.; Yang-Fu, J.; Zhernakova, A.; Calle, M. L.

2026-06-04 epidemiology 10.64898/2026.06.03.26354796 medRxiv
Top 1%
0.1%

While the microbiome is increasingly recognized as crucial for human health, translating this knowledge into effective healthcare and preventive strategies remains challenging. Many studies focus on identifying changes in microbiome composition associated with disease and evaluating the potential of such disease-associated microbial profiles as biomarkers for disease diagnosis. Under the hypothesis that microbiome dysbiosis may reflect physiological alterations present long before disease onset, in this work we analyse the potential of disease-specific microbial signatures not as a diagnostic tool when the disease is already present, but as a means of health assessment in the general population. Moreover, instead of trying to define a single health measure, we believe it is necessary to consider the several ways in which the microbiome departs from health, according to different disease-related physiological changes. To evaluate these assumptions, we designed a two-stage study: the identification of disease-specific microbial signatures (discovery stage) and, subsequently, the study of their distribution in the general population to assess associations with general health (external validation stage). Specifically, in the discovery stage we characterized 16 disease-specific bacterial signatures from large public microbiome datasets using a compositional data analysis methodology. In the second stage, we quantified these microbial signatures in the Lifelines-DMP cohort, a large population-based cohort, and evaluated their association with self-reported health status. Results indicate that most disease-specific microbial signatures are associated with health status, supporting our assumption that microbial composition can capture physiological alterations before disease onset, and highlighting the importance of considering the multiple ways in which the microbiome departs from a healthy state.
These findings reaffirm the potential of microbial information as an additional tool in preventive medicine.
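Compositional microbial signatures are often expressed as log-contrasts: a weighted sum of log abundances whose weights sum to zero, which makes the score invariant to multiplying all of a sample's counts by a constant (for example, a different sequencing depth). The sketch assumes that form and uses hypothetical taxa weights and counts; it is not the signature-selection procedure used in the study.

```python
import numpy as np


def log_contrast_score(abundances, weights, pseudocount=0.0):
    """Log-contrast score sum_j w_j * log(x_j), with the w_j summing to zero.

    abundances: (n_samples, n_taxa) counts or proportions.
    pseudocount: added before the log to handle zeros; a nonzero value
    breaks exact scale invariance.
    """
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 0.0):
        raise ValueError("log-contrast weights must sum to zero")
    x = np.asarray(abundances, dtype=float) + pseudocount
    return np.log(x) @ w


# Hypothetical signature: taxa 1 and 3 enriched, taxa 2 and 4 depleted.
weights = [1.0, -1.0, 0.5, -0.5]
counts = np.array([[10, 20, 30, 40], [5, 1, 8, 2]])
scores = log_contrast_score(counts, weights)
```

Scoring each cohort member on every disease-specific signature in this way yields one score per signature, which can then be related to self-reported health.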

9
Positioning Early Phase CNS Trials for Regulatory and Investor Success: Strategic Implications of the Single Phase 3 Approval Paradigm

Schmidt, P.; Preskorn, S.

2026-06-08 neurology 10.64898/2026.06.05.26353604 medRxiv
Top 1%
0.1%

In February 2026, the FDA announced that a single pivotal phase 3 (P3) trial would become the new default standard for drug approval, a regulatory direction that had been legally enabled since the FDA Modernization Act of 1997. This announcement has strategic, scientific, and economic implications for drug developers, contract research organizations (CROs), and biotech investors. We argue that the expansion of this framework, originally reserved for various niche submissions, represents a paradigm change that dramatically increases the value of rigorous early phase (P1 and P2) trial design, requiring sponsors to establish both statistical efficacy signals and mechanistic biological understanding before entering phase 3. Using a CNS indication cost model, we show that single-P3 approval can reduce total development expenditure from approximately $447 million over 14 years to $297 million over 12 years, a saving of $150 million that also provides two additional years of commercial runway for a modeled CNS drug. Case examples including lecanemab, omaveloxolone, and tofersen illustrate how biomarker-informed early phase strategies can establish the confirmatory evidence necessary for single-trial approval. We provide practical guidance for maximizing the value of P1 and P2 under this evolving framework.

10
Optimisation of steatotic liver disease screening algorithm for resource-poor settings using machine learning

Mettananda, C.; Sivasumithran, K.; Ranaweera, L.; Madhubhashini, A.; Ranawaka, C.; Pathmeswaran, A.; Dassanayake, A.

2026-06-10 endocrinology 10.64898/2026.06.09.26355306 medRxiv
Top 2%
0.1%

Background The European Association for the Study of the Liver (EASL) Steatotic Liver Disease (SLD) screening algorithm involves two steps: initial screening with FIB-4, followed by referral for vibration-controlled transient elastography (VCTE) of patients likely to have significant fibrosis (SF). However, VCTE is not widely available in resource-limited settings. Aim To optimise the EASL SLD screening algorithm for resource-poor settings using machine learning (ML). Methods We analysed data from 964 adults aged ≥35 years who underwent VCTE at a tertiary referral centre in Sri Lanka between November 2024 and 2025. Multiple ML models using different methods and variable combinations were trained on 80% of the dataset and tested on the remaining 20%. The best models were selected based on performance and externally validated using data from 430 patients who underwent VCTE before November 2024. Model performance was compared with FIB-4 using confusion matrices. Results A Random Forest model incorporating age, AST, ALT, and platelet count as separate variables, rather than combined into FIB-4, outperformed FIB-4. The all-variable ML model showed the best predictive performance for SF, with an accuracy of 77.2%, recall of 0.762, precision of 0.778, and AUC-ROC of 0.818. The variables used in the model, in descending order of feature importance, were AST, platelet count, BMI, ALT, age, diabetes mellitus, hypertension, dyslipidaemia, sex, family history, hypothyroidism, diabetes complications, and smoking. External validation demonstrated 75.1% accuracy and an AUC of 0.779. When used as the first step of the SLD screening algorithm, the all-variable ML model identified 37 (17.1%) additional true positives and reduced false-negative diagnoses by 50% compared with FIB-4. Conclusions ML-based models were more effective than the FIB-4 score as the first-line screening tool for VCTE referral, substantially improving the identification of patients with significant fibrosis in this South Asian cohort.
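For reference, FIB-4 combines the same four variables as (age × AST) / (platelet count × √ALT), with AST and ALT in U/L and platelets in 10^9/L, and the reported accuracy, recall, and precision follow directly from a confusion matrix. The helper names, the example patient, and the example counts below are hypothetical.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, recall (sensitivity), and precision (positive predictive value)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision


# Hypothetical patient and hypothetical test-set counts.
score = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=16, platelets_1e9_per_l=200)  # 2.5
acc, rec, prec = confusion_metrics(tp=80, fp=20, fn=25, tn=75)
```

FIB-4 fixes how the four variables are combined, whereas a Random Forest given the same variables separately can learn its own non-linear combination, which is the comparison the abstract reports.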

11
Local Influenza Forecasts Outperform State-Level Forecasts in the United States

Kim, D.; Pasco, R.; Johnson, K. E.; Fox, S. J.; Reich, N. G.; Meyers, L. A.

2026-06-08 infectious diseases 10.64898/2026.06.04.26354836 medRxiv
Top 2%
0.1%

Accurate outbreak forecasts are critical for timely and effective public health response. In the United States, however, most forecasts are produced at the state level, which can mask substantial sub-state heterogeneity and limit their utility for local planning. We generated and evaluated forecasts of the percentage of Emergency Department visits attributable to influenza across 173 large metropolitan Health Service Areas (HSAs) using a gradient boosting quantile regression (GBQR) model, and compared their accuracy with that of forecasts derived from state-level data alone. At one-week, two-week, and three-week horizons, local forecasts outperformed state-based forecasts in 98.8%, 90.8%, and 78.6% of HSAs, respectively, achieving mean weighted interval scores that were on average 39.2% lower (95% range: 5.9% to 76.7%), 19.6% lower (-6.3% to 59.5%), and 11.4% lower (-11.7% to 44.9%), respectively. The performance advantage of local forecasting was strongest in HSAs representing a smaller share of their state's population and increased with the proportion of the HSA population living in urban areas and with the number of metropolitan areas within a state. These results, based on an analysis of HSAs with populations greater than 250,000, demonstrate that fine-scale modeling can substantially improve forecast accuracy and highlight the potential value of local forecasts for outbreak preparedness and response.
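The weighted interval score (WIS) compared here can be computed from a forecast's median and its central prediction intervals. The sketch follows the standard definition, with weight 1/2 on the absolute error of the median and α/2 on the interval score of each central (1 - α) interval, normalised by K + 1/2 for K intervals; lower is better. Function names and the example forecast are hypothetical.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def weighted_interval_score(median, intervals, y):
    """WIS with weight 1/2 on the median and alpha/2 on each interval.

    intervals: iterable of (alpha, lower, upper) tuples.
    """
    intervals = list(intervals)
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def relative_wis_reduction(wis_local, wis_state):
    """Fractional reduction in WIS of a local versus a state-based forecast."""
    return 1.0 - wis_local / wis_state


# Hypothetical forecast: median 10 with 50% and 90% central intervals.
forecast = [(0.5, 8.0, 12.0), (0.1, 5.0, 15.0)]
wis_hit = weighted_interval_score(10.0, forecast, 10.0)   # observation at the median
wis_miss = weighted_interval_score(10.0, forecast, 20.0)  # observation above both intervals
```

A "39.2% lower" WIS corresponds to relative_wis_reduction returning 0.392 for that pair of forecasts.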

12
Revisiting Plasmodium vivax molecular correction

Taylor, A. R.; Foo, Y. S.; White, M. T.

2026-06-04 infectious diseases 10.64898/2026.06.02.26354709 medRxiv
Top 2%
0.1%

Background: Reliable inference of Plasmodium vivax recurrence states (relapse, recrudescence and reinfection, the "3Rs") improves estimates of antimalarial efficacy. The R package Pv3Rs features a Bayesian model designed for P. vivax molecular correction, i.e., using parasite genetic data to infer recurrence states. The model is an extension of a prototype built to analyse microsatellite data from the Vivax History (VHX) and Best Primaquine Dose (BPD) trials. Methods: We re-analysed data from 212 VHX and BPD trial participants (493 recurrences) using Pv3Rs, comparing results with those from the prototype and with genetic relatedness estimated using Dcifer, a tool for estimating relatedness based on identity-by-descent. Posterior recurrence state probabilities were computed using both uniform and time-to-event priors: artificial but equal prior probabilities facilitate posterior interpretation, while time-to-event priors leverage all available information and enable re-computation of failure rates. Relatedness estimates were used to identify and correct instances of model misspecification. Results: The Pv3Rs model generated posterior probabilities for all recurrences and was able to jointly model data on all episodes per participant for 89% of participants, compared with 73% using the prototype. Recurrence state probabilities were broadly consistent across methods, though the Pv3Rs model slightly elevated reinfection probabilities. Relatedness estimates exposed various outliers consistent with half-sibling parasites and/or genotyping errors. Outlier correction affected some per-participant failure probabilities, but reinfection-adjusted radical-cure failure rates of high-dose primaquine remained near 3%, in line with previous findings. Conclusion: Re-analysis of VHX and BPD P. vivax genetic data reaffirms earlier reinfection-adjusted efficacy estimates.
It demonstrates the increased computational capability and misspecification sensitivity of Pv3Rs, highlighting a need for careful analyses. Using relatedness-based diagnostics alongside model-based inference, we were able to harness the advantages of model-based inference and provide a framework for future P. vivax molecular correction.
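At its core, a posterior recurrence-state probability is prior × likelihood, normalised over the three states, which is why the choice between uniform and time-to-event priors matters. In Pv3Rs the likelihoods come from a genetic model of all of a participant's episodes; in the sketch below they are hypothetical inputs, and the function is an illustration rather than the package's API.

```python
STATES = ("relapse", "recrudescence", "reinfection")


def posterior_recurrence_probs(likelihoods, prior=None):
    """Normalise prior x likelihood over the three recurrence states.

    likelihoods: dict mapping each state to P(genetic data | state).
    prior: dict of prior probabilities; defaults to uniform (1/3 each).
    """
    if prior is None:
        prior = {s: 1.0 / len(STATES) for s in STATES}
    unnormalised = {s: prior[s] * likelihoods[s] for s in STATES}
    total = sum(unnormalised.values())
    return {s: v / total for s, v in unnormalised.items()}


# Hypothetical likelihoods and a hypothetical time-to-event prior.
lik = {"relapse": 2.0, "recrudescence": 1.0, "reinfection": 1.0}
uniform_post = posterior_recurrence_probs(lik)
tte_post = posterior_recurrence_probs(
    lik, prior={"relapse": 0.6, "recrudescence": 0.1, "reinfection": 0.3}
)
```

Under the uniform prior the posterior simply reflects the relative likelihoods (0.5, 0.25, 0.25 here), whereas the time-to-event prior shifts mass towards states that are more plausible given when the recurrence occurred.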

13
Adapting a Regulation of Craving Magnetic Resonance Imaging Task to Generate Functional Repetitive Transcranial Magnetic Stimulation Targets for the Ventromedial and Dorsolateral Prefrontal Cortex in Treatment-Seeking Participants with Cannabis Use Disorder

Geoly, A.; McCalley, D. M.; Struckmann, W.; Azeez, A.; Wong, B.; Kim, B.; Ninomiya, S.; Ahmed, S.; Kim, J. P.; McRae-Clark, A. L.; Froeliger, B.; Sahlem, G. L.

2026-06-06 addiction medicine 10.64898/2026.06.04.26353616 medRxiv
Top 2%
0.1%

Background: Repetitive Transcranial Magnetic Stimulation (rTMS) is a promising treatment across addictive disorders, including Cannabis Use Disorder (CUD). Targeting incentive-salience circuitry via the ventromedial prefrontal cortex (vmPFC) and central-executive circuitry via the left dorsolateral prefrontal cortex (LDLPFC) are both promising treatment approaches; however, structural targets have predominated to date, whereas functional targeting may allow greater precision. In this pilot trial, we adapted a functional Magnetic Resonance Imaging (fMRI) Regulation of Craving (ROC) task to generate fMRI-based rTMS targets in the vmPFC and LDLPFC. Methods: We recruited treatment-seeking participants with moderate or severe CUD as part of an open-label trial and administered an adapted ROC task during fMRI following 24 hours of cannabis abstinence. We identified sub-regions of maximal activation in the LDLPFC when participants thought of long-term consequences of cannabis use (Later) and in the vmPFC when participants thought of short-term positive aspects of cannabis use (Now). We hypothesized that our task would generate acceptable rTMS targets in >66% of baseline fMRI scans. Results: A total of 20 participants enrolled in the trial (50% F, age = 33.3 ± 9.8) and completed the baseline fMRI. The adapted ROC task elicited group-level activation in the LDLPFC and precuneus in the Later>Now contrast, and in the bilateral vmPFC, ACC, and striatum in the Now>Later contrast. Acceptable functional targets resolved in both the vmPFC and LDLPFC in 19 of 20 participants (one participant did not tolerate MRI). Conclusions: The adapted ROC task elicits activation in incentive-salience and central-executive circuitry and can feasibly generate rTMS targets when using a cluster selection algorithm.

14
Formalising Limits of Circulating Tumour DNA Detection: A Signal Detection Framework for Clinical Threshold Specification

Walinjkar, A.

2026-06-10 oncology 10.64898/2026.06.08.26355204 medRxiv
Top 2%
0.1%

Background: Circulating tumour DNA (ctDNA) liquid biopsy is now established across oncology for early cancer detection, minimal residual disease surveillance, and treatment monitoring. Detection thresholds for all current ctDNA assays are derived empirically through receiver operating characteristic analysis on training cohorts, a statistically valid but theoretically uninformed approach that neither specifies the minimum detectable tumour fraction given an assay's technical characteristics nor identifies when increasing sequencing depth ceases to provide additional clinical information. Methods: We model ctDNA detection as a binary hypothesis testing problem with Binomial-distributed mutant allele counts against a sequencing-error noise floor. The Neyman-Pearson lemma is applied to derive the uniformly most powerful detector and the minimum detectable tumour fraction in closed form. The sequencing assay is modelled as a binary symmetric channel, and its Shannon channel capacity is calculated. Empirical validation uses n=61 data points extracted from five published peer-reviewed analytical validation studies across five independent institutions in the US and EU (2018-2025): Yu et al. 2022, Stetson et al. 2018, Frydendahl et al. 2023, Northcott et al. 2024, and Cheng et al. 2025. Results: The minimum detectable tumour fraction is derived in closed form as f_min ≈ (z_α + z_β)·√(ε/N), where N is sequencing depth, ε is the platform error rate, and z_α and z_β are standard normal quantiles at the specified false-positive and false-negative rates. The Shannon channel capacity is C = 1 − H(ε) bits per read, where H(ε) is the binary entropy. Empirical validation yields 84.3% agreement for single-locus assays.
Discordance for multi-locus tumour-informed assays (NeXT Personal, duplex WGS) is consistent with the single-locus model scope and identifies the principal theoretical extension required. Conclusions: This framework provides the first formal Neyman-Pearson optimality proof for ctDNA detection, a closed-form detection limit, and a platform-independent efficiency metric for NHS and regulatory standardisation. Keywords: circulating tumour DNA; liquid biopsy; Neyman-Pearson detection; Shannon channel capacity; sequencing depth; limit of detection; minimal residual disease; signal detection theory
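The two closed-form results can be evaluated directly. The sketch implements the stated approximation f_min ≈ (z_α + z_β)·√(ε/N), taking z_α and z_β as the upper standard normal quantiles at 1 - α and 1 - β (a one-sided test is assumed), together with the binary-symmetric-channel capacity C = 1 − H(ε). The function names, default α and β, error rate, and depths in the example are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_fraction(error_rate, depth, alpha=0.05, beta=0.20):
    """f_min ~ (z_alpha + z_beta) * sqrt(error_rate / depth), one-sided test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) * math.sqrt(error_rate / depth)


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(error_rate):
    """Capacity of a binary symmetric channel, in bits per read."""
    return 1.0 - binary_entropy(error_rate)


# Hypothetical platform error rate of 1e-4 at two sequencing depths.
for depth in (10_000, 40_000):
    print(depth, min_detectable_fraction(1e-4, depth))
```

Because f_min scales as 1/√N, quadrupling depth only halves the detection limit, which is the diminishing-returns behaviour the framework is designed to make explicit.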

15
EMOD with Full Parasite Genetics: A modeling framework for evaluating parasite genetic metrics for operational malaria molecular surveillance

Ribado, J. V.; Suresh, J.; Bridenbecker, D.; Russell, J. R.; Lee, A.; Wenger, E.; Chabot-Couture, G.; Proctor, J. L.; Battle, K. E.; Bever, C. A.

2026-06-08 public and global health 10.64898/2026.06.05.26355027 medRxiv
Top 2%
0.1%

Malaria molecular surveillance (MMS) is becoming increasingly common in endemic settings and has been proposed as a tool for monitoring parasite transmission to inform programmatic decision-making. However, the conditions under which parasite genetic metrics provide interpretable signals for broader use cases, such as assessing intervention impacts and detecting importation, remain under-characterized. We present EMOD with Full Parasite Genetics (FPG), a simulation framework designed to explore how parasite genetic metrics arise from transmission, intervention, importation, and sampling processes at programmatically relevant timescales. Using seasonal scenarios across a range of transmission intensities, we demonstrate three principal findings. First, genetic metrics can detect the impacts of insecticide-treated net interventions at seasonal and yearly timescales, but the strength, timing, and form of the relationship between genetic and epidemiological measures vary by metric and sampling timing. Second, importation can break the expected relationship between parasite genetic diversity and local transmission intensity at very low incidence, allowing low-transmission settings with substantial importation to maintain elevated diversity metrics. Third, convenience sampling practices, including sample size, collection timing, and the clinical composition of sampled populations, introduce non-random biases into genetic metric estimation that obscure the true transmission signal. Together, these findings show that parasite genetic metrics can support operational surveillance, but that their interpretation depends on transmission context, importation, metric choice, and sampling design. EMOD FPG provides a framework for evaluating these dependencies in future setting-specific analyses and for guiding the interpretation of parasite genetic data across sites and over time.
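As one example of the kind of diversity metric such a framework evaluates, expected heterozygosity at a locus, H_E = n/(n − 1)·(1 − Σ p_i²), summarises allele diversity among n sampled alleles with frequencies p_i. Whether EMOD FPG reports this particular metric is not stated in the abstract; the function and the allele samples below are illustrative assumptions.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical samples at one locus: a diverse site and a near-clonal site.
diverse = expected_heterozygosity(["A", "B", "C", "D", "A", "B"])
clonal = expected_heterozygosity(["A"] * 6)
```

A sample that includes imported infections carrying distinct alleles can keep H_E high even when local transmission is low, which illustrates how importation can decouple diversity from local transmission intensity.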

16
Disentangling infectiousness and susceptibility by age group using transmission pair data: a study of SARS-CoV-2 household transmission

Leung, K. Y.; Miura, F.; Backer, J. A.

2026-06-05 epidemiology 10.64898/2026.06.04.26354892 medRxiv
Top 2%
0.1%

Background: Differential contributions to transmission across age groups have been reported for many respiratory infections, including SARS-CoV-2, and are crucial for estimating the impact of age-specific interventions. Disentangling these age-dependent contributions remains challenging, as they may reflect differences in contact rates, biological susceptibility, or infectiousness. Aim: We aimed to jointly estimate age-specific per-contact infectiousness and susceptibility and to assess how they affect the projected impact of age-specific interventions. Methods: Age-specific infectiousness and susceptibility were jointly estimated in a Bayesian framework by combining contact data with transmission pair data (who-infected-whom). We applied this approach to 197,840 self-reported household transmission pairs collected in the Netherlands during the COVID-19 pandemic. Using these estimates, we projected the expected impact of school closure and work-from-home measures during the early stages of an epidemic in the absence of other interventions. Results: Both infectiousness and susceptibility to SARS-CoV-2 infection were lowest in children aged 0-9 years and highest in adults over 30 years old, with 2- to 4.5-fold differences between these groups. Projected impacts of age-specific interventions indicated that school closures would reduce the reproduction number by 8% when age-specific susceptibility and infectiousness were considered and by 29% when they were not. Conversely, work-from-home policies would reduce it by 41% with and by 20% without age-specific infectiousness and susceptibility. Conclusion: Our method enables robust estimation of age-specific infectiousness and susceptibility. Accounting for these age heterogeneities is essential for projecting the impact of age-targeted interventions. Our approach is adaptable to other respiratory infections and can guide more tailored public health responses.
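The abstract does not spell out how per-contact infectiousness and susceptibility feed into the projected intervention impacts. One common approach, assumed here rather than taken from the paper, is a next-generation matrix K[i][j] = s_i c_ij q_j (susceptibility of group i, contacts of i with j, infectiousness of j) whose dominant eigenvalue is proportional to the reproduction number, so the relative reduction in R is the relative drop in that eigenvalue. The function names, contact matrices, and parameter values below are hypothetical.

```python
# Minimal sketch (assumed approach, not the authors' code): relative reduction
# in the reproduction number R when the contact matrix changes, with
# age-specific susceptibility and infectiousness in a next-generation matrix.


def next_generation_matrix(contacts, susceptibility, infectiousness):
    """K[i][j] = susceptibility[i] * contacts[i][j] * infectiousness[j].

    contacts[i][j]: mean daily contacts of a person in group i with group j.
    """
    n = len(contacts)
    return [
        [susceptibility[i] * contacts[i][j] * infectiousness[j] for j in range(n)]
        for i in range(n)
    ]


def spectral_radius(matrix, iterations=1000):
    """Dominant eigenvalue by power iteration (assumes all entries are positive)."""
    n = len(matrix)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = max(abs(x) for x in w)
        if eigenvalue == 0.0:
            return 0.0
        v = [x / eigenvalue for x in w]  # rescale so the largest entry is 1
    return eigenvalue


def relative_reduction(contacts_before, contacts_after, susceptibility, infectiousness):
    """1 - R_after / R_before; constant factors such as infectious duration cancel."""
    r_before = spectral_radius(
        next_generation_matrix(contacts_before, susceptibility, infectiousness))
    r_after = spectral_radius(
        next_generation_matrix(contacts_after, susceptibility, infectiousness))
    return 1.0 - r_after / r_before


# Hypothetical two-group example (children, adults): "school closure"
# removes most child-child contacts.
baseline = [[4.0, 1.0], [1.0, 2.0]]
schools_closed = [[1.0, 1.0], [1.0, 2.0]]
homogeneous = relative_reduction(baseline, schools_closed, [1.0, 1.0], [1.0, 1.0])
heterogeneous = relative_reduction(baseline, schools_closed, [0.5, 1.0], [0.5, 1.0])
print(f"equal s and q: {homogeneous:.1%}; lower child s and q: {heterogeneous:.1%}")
```

With these made-up inputs, giving children lower susceptibility and infectiousness shrinks the projected effect of school closure from about 41% to about 3%, the same qualitative direction as the abstract's 29% versus 8%.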

17
Development of Longitudinal, Linked Maternal-Infant Cohorts using the Epic Cosmos Electronic Health Record Dataset

Leonard, S. A.; Dysart, K.; Callahan, A.; Siadat, S.; Zhang, J.; Handley, S. C.; Huybrechts, K. F.; Igbinosa, I.; Bateman, B. T.

2026-06-04 epidemiology 10.64898/2026.06.02.26354757 medRxiv
Top 2%
0.0%

Background: Epic Cosmos is a relatively new centralized electronic health record dataset with high potential utility in perinatal epidemiologic research. Objectives: The study objectives were to develop replicable steps to create longitudinal, linked maternal-infant cohorts in Cosmos, assess completeness of key variables, evaluate potential selection bias with restrictions for longitudinal healthcare encounters, and provide an example epidemiologic analysis. Methods: We created maternal-infant cohorts by starting with live births during 2023-2024 recorded in the BirthFact data table and joining with additional data tables as needed. We selected and created variables for perinatal characteristics, common comorbidities, and routinely measured vital signs and laboratory values, and assessed variable completeness. We sequentially restricted the birth cohort for maternal-infant linkage and longitudinal healthcare, from a first-trimester prenatal care encounter through infant follow-up care within 12 weeks after discharge from the birth hospitalization. Finally, we conducted an example analysis of the association between high systolic blood pressure in the first trimester (≥140 mm Hg) and later onset of preeclampsia among those with chronic hypertension. Results: The total linked birth cohort included 2,624,186 pregnancies. Completeness was >90% for most variables assessed but was 77% for racial and ethnic group and 76% for body mass index at delivery. Characteristics of the cohort were similar to those reported for the entire United States birth population based on birth certificate data, including similar regional and racial-ethnic composition. Longitudinal cohort restriction requiring linked records from first-trimester prenatal care through infant follow-up care reduced the cohort size to 509,148 pregnancies. However, restriction had minimal effects on cohort characteristics. In the example analysis, high systolic blood pressure was associated with increased risk of preeclampsia among those with chronic hypertension (aRR: 1.26; 95% CI: 1.22, 1.30). Conclusions: This study provides a rigorous and reproducible approach to creating longitudinal, linked maternal-infant cohorts in Epic Cosmos, and the analytical findings suggest high data quality and representativeness.
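The example analysis reports an adjusted risk ratio (aRR) with a 95% confidence interval. As a hedged illustration of the underlying quantity only, the sketch below computes a crude (unadjusted) risk ratio and its Wald interval on the log scale from a 2×2 table. It does not reproduce the authors' covariate adjustment, and the function name and counts are hypothetical.

```python
import math


def risk_ratio_ci(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Crude risk ratio with a Wald confidence interval computed on the log scale."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence (cohort) data.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: preeclampsia among people with chronic hypertension,
# split by first-trimester systolic blood pressure >= 140 mm Hg.
rr, lo, hi = risk_ratio_ci(exposed_cases=300, exposed_total=1000,
                           unexposed_cases=240, unexposed_total=1000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

An adjusted estimate such as the reported aRR would typically come from a regression model (for example, log-binomial or modified Poisson regression), which this sketch does not attempt.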

18
Assessing the impact of absence of coordination in malaria intervention strategies: a modelling study

Iggidr, Y.; Ruktanonchai, N. W.; Benhana, B.; Turbe, V.; Bauzile, B.; Ward, A.; Cohen, J.; Pothin, E.; Champagne, C.

2026-06-05 epidemiology 10.64898/2026.06.03.26354857 medRxiv
Top 3%
0.0%

Malaria control programs are increasingly tailored at subnational scales; however, neighboring areas remain connected through human mobility, allowing parasite importation that may undermine independently timed interventions. Although the spatial targeting of control has been the focus of extensive research, the epidemiological consequences of temporal misalignment in intervention deployment across interconnected regions remain poorly understood. We investigate how asynchronous timing of malaria interventions affects transmission dynamics using a two-patch susceptible-infected-susceptible metapopulation model. We compare synchronous and asynchronous intervention schedules and quantify the excess cumulative incidence attributable to asynchrony, a measure we term Asynchrony Induced Growth (AIG). Across 10,000 parameter combinations, asynchronous implementation resulted in higher incidence than synchronized deployment, although the effect was negligible in most endemic settings. Sensitivity analyses indicate that the effect is most pronounced when interventions are highly effective, the infectious duration is short, and transmission intensity approaches the elimination threshold. In such circumstances, asynchrony can substantially inflate case numbers, delay transmission interruption, or even prevent elimination entirely. In illustrative scenarios reflecting realistic settings, synchronizing interventions averted large numbers of infections and shortened elimination timelines by years to decades. These findings demonstrate that, beyond spatial targeting, temporal coordination of interventions across connected areas can meaningfully enhance malaria control and elimination. Coordinated timing may be particularly valuable for cross-border or near-elimination programs and should be considered in operational planning and resource allocation.
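To make the comparison concrete, the sketch below integrates a simple two-patch SIS model with forward Euler and compares synchronous and asynchronous intervention windows. The coupling scheme (a fixed mobility weight on the other patch's prevalence), the parameter values, and the function names are all illustrative assumptions. AIG is taken here to be the relative excess total cumulative incidence of the asynchronous schedule over the synchronous one; the paper's exact definition may differ.

```python
# Minimal sketch (illustrative assumptions, not the authors' model): two-patch
# SIS dynamics in which each patch's transmission rate is reduced by
# `efficacy` during its own intervention window.


def simulate_two_patch_sis(beta, gamma, mobility, starts, duration, efficacy,
                           t_end=730.0, dt=0.1, i0=(0.05, 0.05)):
    """Return cumulative incidence (new infections per capita) in each patch.

    beta: baseline transmission rate in each patch (per day)
    gamma: recovery rate (per day), shared by both patches
    mobility: weight a patch's force of infection places on the other patch
    starts: day on which each patch's intervention begins
    duration: number of days each intervention stays active
    efficacy: proportional reduction in beta while an intervention is active
    """
    infected = list(i0)
    cumulative = [0.0, 0.0]
    for step in range(int(round(t_end / dt))):
        t = step * dt
        # Transmission rate per patch, reduced while its intervention is active.
        rate = [beta[k] * (1.0 - efficacy) if starts[k] <= t < starts[k] + duration
                else beta[k] for k in range(2)]
        new_cases = []
        for k in range(2):
            other = 1 - k
            # Force of infection mixes local and imported infection pressure.
            foi = rate[k] * ((1.0 - mobility) * infected[k] + mobility * infected[other])
            new_cases.append(foi * (1.0 - infected[k]))
        for k in range(2):
            infected[k] += dt * (new_cases[k] - gamma * infected[k])
            cumulative[k] += dt * new_cases[k]
    return cumulative


def asynchrony_induced_growth(cumulative_async, cumulative_sync):
    """Relative excess cumulative incidence due to asynchrony (assumed definition)."""
    total_sync = sum(cumulative_sync)
    return (sum(cumulative_async) - total_sync) / total_sync


params = dict(beta=(0.3, 0.3), gamma=0.1, mobility=0.05, duration=180.0, efficacy=0.8)
synchronous = simulate_two_patch_sis(starts=(100.0, 100.0), **params)
asynchronous = simulate_two_patch_sis(starts=(100.0, 250.0), **params)
print(f"AIG = {asynchrony_induced_growth(asynchronous, synchronous):.3f}")
```

Forward Euler with a small step keeps the sketch dependency-free; a production analysis would more likely use an adaptive ODE solver and sweep the parameters, as the abstract's 10,000 combinations suggest.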

19
Modeling the Impact of Pediatric RSV Immunization in Massachusetts, 2024-2025

Jones, L.; Ergas, R.; Tibbs, A.; Russo, E. T.; Norville, J.; Bingay, B.; Brown, C. M.; Reich, N. G.; Pasco, R.

2026-06-10 epidemiology 10.64898/2026.06.05.26354236 medRxiv
Top 3%
0.0%

Background: Pediatric immunizations for Respiratory Syncytial Virus (RSV), including monoclonal antibodies for infants and vaccines for pregnant people, have become broadly available and can prevent severe RSV outcomes in infants. However, quantifying the population-level impact of RSV immunization in preventing severe pediatric illness is limited by a lack of RSV case surveillance data. The Massachusetts Department of Public Health (DPH) conducted a modeling analysis using routine public health surveillance data to estimate the state-level impact of new RSV immunization products on Emergency Department (ED) visits and hospitalizations in Massachusetts for the highest-risk pediatric groups. Methods: A scenario projection tool, called R.Scenario.Vax, was used to simulate RSV-associated ED and hospital encounters by age group in the context of newly available immunizations. ED visit and hospitalization data from the National Syndromic Surveillance Program (NSSP) for 10/08/2017 to 10/19/2024 were analyzed, scaled to account for changes in RSV testing practices over time and for missing encounter volume in historic data, and used to fit the model to a "typical" RSV season. RSV immunization data from the Massachusetts Immunization Information System (MIIS) for the 2023-2024 and 2024-2025 RSV seasons informed high and moderate pediatric RSV immunization coverage scenarios, whose impact was compared to a counterfactual reference scenario of no new immunizations. Median projections were quantitatively and qualitatively compared to observed 2024-2025 season data. Percent reduction in hospital encounters and encounters averted per 10,000 population were calculated for each scenario relative to the reference. Results: Projections for the youngest at-risk age groups showed significantly fewer RSV-associated ED visits and hospitalizations during the 2024-2025 season under both the high and moderate immunization coverage scenarios. Median projections for infants under 6 months old in the highest coverage scenario, in which nearly all infants were immunized, showed 72.6% fewer ED visits and 73.4% fewer hospitalizations than the reference scenario, corresponding to 262 ED visits and 85 hospitalizations averted per 10,000 population. Conclusions: Our results support the use of modeling methods for public health insights and suggest that RSV immunizations for infant populations result in significantly fewer RSV-related ED encounters in Massachusetts.
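The two summary measures named in the abstract are simple ratios, and together they imply the approximate reference-scenario rate. The sketch below, with hypothetical helper names, computes both measures and back-calculates that implied rate from the reported infant figures. Because the abstract reports medians of projections, and a median of ratios need not equal a ratio of medians, the back-calculated values (roughly 361 ED visits and 116 hospitalizations per 10,000 population) are only a rough consistency check.

```python
# Hypothetical helpers for the summary measures described in the abstract.


def percent_reduction(reference, scenario):
    """Percent fewer encounters in a scenario than in the reference scenario."""
    return 100.0 * (reference - scenario) / reference


def averted_per_10k(reference, scenario, population):
    """Encounters averted per 10,000 population."""
    return (reference - scenario) * 10_000 / population


def implied_reference_rate(averted_rate, pct_reduction):
    """Back-calculate the reference rate per 10,000 from averted and percent figures."""
    return averted_rate / (pct_reduction / 100.0)


# Rough consistency check using the infant (<6 months) figures in the abstract.
ed_reference = implied_reference_rate(262, 72.6)
hosp_reference = implied_reference_rate(85, 73.4)
print(f"implied reference: ~{ed_reference:.0f} ED visits, "
      f"~{hosp_reference:.0f} hospitalizations per 10,000")
```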

20
A wealth index based on two-component polychoric principal component analysis reduces urban bias and improves socioeconomic classification in low- and middle-income country surveys: a validation study using LSMS surveys

Vidaletti, L. P.; Dos Santos, A. M.; Hellwig, F.; Barros, A. J. D.

2026-06-08 epidemiology 10.64898/2026.06.01.26354245 medRxiv
Top 3%
0.0%

Background: The traditional wealth index, based on principal component analysis (PCA), used in the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), suffers from urban bias, distorting estimates of health inequality. We compared the traditional index (PEAR1) with an alternative two-component polychoric PCA index (POLY2), using annual expenditure from 12 LSMS surveys as the gold standard, to determine which provides more accurate measures of socioeconomic position (SEP) for equitable policy targeting. Methods: We compared the traditional wealth index (PEAR1) with a two-component polychoric PCA approach (POLY2) using 12 LSMS (Living Standards Measurement Study) surveys (2015-2022) from 12 African countries. Annual household consumption expenditure was the gold standard. We assessed agreement using weighted Cohen's kappa and validated the indices against education (proportion of households with secondary or higher education) using the concentration index (CIX) and slope index of inequality (SII). Results: The POLY2 index showed higher agreement with expenditure quintiles (average national weighted kappa = 43.3%) than the PEAR1 index (35.1%), with notable improvements in urban (43.5% vs. 27.5%) and rural (35.3% vs. 22.4%) areas. POLY2 also attenuated extreme household distributions observed in PEAR1. Education validation showed that POLY2 produced intermediate inequality gradients between the flatter expenditure-based gradient and the steeper PEAR1-based gradient. Conclusion: The POLY2 wealth index is superior to the traditional index, reducing urban-rural bias and providing more accurate socioeconomic classifications. Its adoption in large-scale surveys such as DHS and MICS is recommended to improve equitable monitoring of health inequalities in low- and middle-income countries.
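Agreement between wealth-index quintiles and expenditure quintiles is summarized with weighted Cohen's kappa, which credits near-misses (for example, quintile 2 versus 3) more than distant disagreements. The sketch below assumes linear disagreement weights, since the abstract does not state the weighting scheme, and ignores survey sampling weights; the function name and the household data are hypothetical. The abstract reports kappa multiplied by 100 (e.g., 43.3%).

```python
# Minimal sketch: weighted Cohen's kappa for two quintile classifications
# coded 0-4. Linear weights are an assumption; survey weights are ignored.


def weighted_kappa(ratings_a, ratings_b, n_categories=5, scheme="linear"):
    n = len(ratings_a)
    k = n_categories
    # Joint proportion of households in each (a, b) category pair.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row_margin = [sum(observed[i]) for i in range(k)]
    col_margin = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d  # otherwise quadratic

    # Weighted observed vs chance-expected disagreement.
    obs_dis = sum(disagreement(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row_margin[i] * col_margin[j]
                  for i in range(k) for j in range(k))
    return 1.0 - obs_dis / exp_dis


# Hypothetical households: wealth-index quintile vs expenditure quintile.
wealth_q = [0, 1, 1, 2, 3, 4, 4, 2]
spend_q = [0, 1, 2, 2, 3, 3, 4, 1]
print(f"weighted kappa = {weighted_kappa(wealth_q, spend_q):.3f}")
```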